Fix backward rope after YaRN #3974
Merged
ggerganov merged 6 commits into ggml-org:master on Nov 7, 2023
Conversation
The rope backward process was broken after the YaRN RoPE (ggml-org#2268) implementation, due to missing changes in the backward functions. The code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. To avoid future regressions, this PR removes the near-duplicate backward functions and reuses the forward code: a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`, and the sin-values are negated when `forward` is false.

It is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.
cebtenzzre approved these changes on Nov 6, 2023
Collaborator

xpos is only implemented in `ggml_compute_forward_rope_f32` - we should probably fix that.

It's probably a good idea to rename the function after that change, since it no longer only handles the forward pass.
ggerganov approved these changes on Nov 7, 2023
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023
* fix backward process of rope: the rope backward process was broken after the YaRN RoPE (ggml-org#2268) implementation, due to missing changes in the backward functions. The code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. To avoid future regressions, remove the near-duplicate backward functions and reuse the forward code: a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`, and the sin-values are negated when `forward` is false.
* fix finetune rope call to use correct default attn_factor of 1.0f
* remove unused `ggml_rope_xpos_back`: it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.
* fix comments explaining the sine sign in ggml_forward_rope
* add missing function arguments in declaration
* fix function argument type in declaration
The rope backward process was broken after the YaRN RoPE (#2268) implementation, due to missing changes in the backward functions.
The code for the rope backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. To avoid future regressions, this PR removes the near-duplicate backward functions and reuses the forward code: a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. The sin-values are negated when `forward` is false. Additionally, the rope parameters in `ggml_compute_backward` are now correctly propagated to the backward process.